IMAGE RECOGNITION

This invention relates to the recognition of images, and is concerned,
particularly although not exclusively, with the recognition of natural images.

By "natural image" is meant an image of an object that occurs naturally:
for example, an optical image such as a photograph, as well as images of other
wavelengths, such as x-ray and infra-red. The natural image
may be recorded and/or subsequently processed by digital means, but is in contrast
to an image (or image data) that is generated or synthesised by computer or other
artificial means.


The recognition of natural images can be desirable for many reasons. For
example, distinctive landscapes and buildings can be recognised, to assist in the
identification of geographical locations. The recognition of human faces can be
useful for identification and security purposes. The recognition of valuable animals
such as racehorses may be very useful for identification purposes.

In this specification, we present in preferred embodiments of the invention
a new approach to face recognition using a variety of three-dimensional facial surface
representations generated from a University of York (UofY)/Cybula 3D Face
Database. By applying principal component analysis to three-dimensional surface
structure, we show that high levels of accuracy can be achieved when performing
recognition on a large database of 3D face models, captured under conditions that
present typical difficulties to the more conventional two-dimensional approaches.
Results are presented as false acceptance rates and false rejection rates, taking the
equal error rate as a single comparative value. We identify the most effective surface
representations and distance metrics to be used in such application areas as security,
surveillance, data compression and archive searching.


Despite significant advances in face recognition technology, it has yet to
achieve the levels of accuracy required for many commercial and industrial
applications. Although some face recognition systems proclaim extremely low
error rates in the test environment, these figures often increase when the
systems are exposed to a real-world scenario. The reasons for these high error
rates stem from a number of well-known sub-problems that have never been fully
solved. Face recognition systems are highly sensitive to the environmental
circumstances under which images are captured. Variation in lighting conditions,
facial expression and orientation can all significantly increase error rates,
making it necessary to maintain consistent image capture conditions between
query and gallery images for the system to function adequately. However, this
approach eliminates one of the key advantages offered by face recognition: that
it is a passive biometric, in the sense that it does not require subject
co-operation.



In preferred embodiments, we use 3D face models that eliminate some
of the problems commonly associated with face recognition. By relying purely
on geometric shape, rather than the colour and texture information available in
two-dimensional images, we render the system invariant to lighting conditions, at
the expense of losing the distinguishing features only available in colour and
texture data. In addition, the ability to rotate a facial structure in
three-dimensional space allows for compensation of variations in pose, aiding those
methods requiring alignment prior to recognition.

Here we use facial surface data for the first time, taken from 3D face
models, as a substitute for the more familiar two-dimensional images. We take a
well-known method of face recognition, namely the eigenface approach
described by Turk and Pentland [1, 2], and adapt it for use on the new
three-dimensional data. We identify the most effective methods of recognising faces
using three-dimensional surface structure.

In order to test this method of face recognition, we have used a large
database of 3D face models. However, until recently, methods of 3D model
generation have usually employed laser scanning equipment. Such
systems (although highly accurate) are often slow, requiring the subject to remain
perfectly still. Stereo vision techniques are able to capture at a faster rate without
using lasers, but feature correlation requires regions of contrast and stable local
texture, something that cheeks and forehead distinctly lack. For these reasons,
three-dimensional face recognition has remained relatively unexplored, when
compared to the wealth of research focusing on two-dimensional face
recognition. Although some investigations have experimented with 3D data [3,
4, 5, 6], they have had to rely on small test sets of 3D face models or used
generic face models to enhance two-dimensional images prior to recognition [7,
8, 9]. However, this research demonstrates that the use of three-dimensional
information has the potential to improve face recognition well beyond the
current state of the art. With the emergence of new three-dimensional capture
equipment, population of a large 3D face database has now become viable and
is being undertaken at the UofY/Cybula as part of a project facilitating research
into three-dimensional face recognition technology.

Previous research has explored the possibilities offered by three-dimensional
geometric structure to perform face recognition. To date, the
research has focused on two-dimensional images, although some have attempted
to use a-priori knowledge of facial structure to enhance these existing
two-dimensional approaches. For example, Zhao and Chellappa [7] use a generic 3D
face model to normalise facial orientation and lighting direction in
two-dimensional images. Using estimations of light source direction and pose, the
3D face model is aligned with the two-dimensional face image and used to
project a prototype image of the frontal pose equivalent, prior to recognition by
linear discriminant analysis. Through this approach, recognition accuracy on the
test set is increased from approximately 81% (correct match within rank of 25)
to 100%. Similar results are witnessed in the Face Recognition Vendor Test [10],
showing that pose correction using Romdhani, Blanz and Vetter's 3D
morphable model technique [8] reduces error rates when applied to the FERET
database [11].

Blanz, Romdhani and Vetter [?] take a comparable approach, using a
3D morphable face model to aid in identification of 2D face images. Beginning
with an initial estimate of lighting direction and face shape, Romdhani et al
iteratively alter shape and texture parameters of the morphable face model,
minimising difference to the two-dimensional image. These parameters are then
taken as features for identification.

Although the methods discussed show that knowledge of three-dimensional
face shape can improve two-dimensional face recognition systems
by improving normalisation, none of the methods mentioned so far use actual
geometric structure to perform recognition. Beumier and Acheroy [3], by contrast,
make direct use of such information, generating 3D face models using an
approach based on structured light deformation. Beumier and Acheroy test
various methods of matching 3D face models, few of which were successful.
Curvature analysis proved ineffective, and feature extraction was not robust
enough to provide accurate recognition. However, Beumier and Acheroy were
able to achieve reasonable error rates using curvature values of vertical surface
profiles. Verification tests carried out on a database of 30 people produced equal
error rates between 7.25% and 9% on the automatically aligned surfaces and
between 6.25% and 9.5% when manual alignment was used.



Chua et al [6] take a different approach, applying non-rigid surface
recognition techniques to the face structure. An attempt is made to identify and
extract rigid areas of facial surfaces, creating a system invariant to facial
expression. The characteristic used to identify these rigid areas and ultimately
distinguish between faces is the point signature, which describes depth values
surrounding local regions of specific points on the facial surface. The similarity
of two face models is computed by identifying and comparing a set of unique
point signatures for each face. Identification tests show that the probe image is
identified correctly for all people when applied to a test set of 30 depth maps of
6 different people.

Coombes et al [12] investigate a method based on differential geometry.
Curvature analysis is applied to a depth map of the facial surface, segmenting
the surface into one of eight fundamental types: peak, ridge, saddle ridge,
minimal, pit, valley, saddle valley and flat. Coombes et al suggest that two faces
may be distinguished by comparing the curve type classification of
correlating regions. A quantitative analysis of the average male and female face
structure shows distinct differences in chin, nose, forehead shape and cheek
bone position between faces of different gender.






Another method, proposed by Gordon [5], incorporates feature
localisation. Using both depth and curvature information extracted from
three-dimensional face models, Gordon identifies a number of facial features, from
which a set of measurements are taken, including head width, numerous nose
dimensions and curvatures, distance between the eyes and eye width. These
features are evaluated using Fisher's linear discriminant, determining the
discriminating ability of each individual feature. Gordon's findings show that
the head width and nose location are particularly important features for
recognition, whereas eye widths and nose curvatures are less useful. Recognition
is performed by means of a simple Euclidean distance measure in feature space.
Several combinations of features are tested using a database of 24 facial surfaces
taken from 8 different people, producing results ranging from 70.8% to 100%
correct matches.

According to one aspect of the present invention, there is provided a
method of recognising a natural image, comprising the steps of generating a
depth map of the image, generating eigenvectors and eigenvalues from the depth
map, and recognising the image by those eigenvectors and eigenvalues.

In another aspect, the invention provides a device for recognising a
natural image, comprising means for generating a depth map of the image,
means for generating eigenvectors and eigenvalues from the depth map, and
means for recognising the image by those eigenvectors and eigenvalues.



A method or device as above may further include any or all of the
features or method steps disclosed in this specification (including the drawings),
which may be combined in any combination, except combinations where at least
some of such features and/or steps are mutually exclusive.

For a better understanding of the invention, and to show how
embodiments of the same may be carried into effect, reference will now be
made, by way of example, to the accompanying diagrammatic drawings, in
which:



Figure 1 shows examples of face models taken from a 3D face database; 



P690GB - Specification as filed - 09 October 2003




Figure 2 shows orientation of a raw 3D face model (left) to a frontal
pose (middle) and facial surface depth map (right);

Figure 3 shows an average depth map (left most) and first eight
eigensurfaces;

Figure 4 is a graph showing false acceptance rate and false rejection rate
for baseline 3D face recognition systems using facial surface depth maps and a
range of distance metrics;

Figure 5 is a diagram of verification test procedure;

Figure 6 is a graph showing false acceptance rate and false rejection rate
for 3D face recognition systems using optimum surface representations and
distance metrics;

Figure 7 is a chart to show equal error rates of 3D face recognition
systems using a variety of surface representations and distance metrics; and

Figure 8 shows brief descriptions of surface representations with
convolution kernels used.

As mentioned previously, there is little three-dimensional face data
publicly available at present and nothing approaching the magnitude of data required
for development and testing of three-dimensional face recognition systems.
Therefore, we have collected a new database of 3D face models at
UofY/Cybula as part of an ongoing project to provide a publicly available 3D
Face Database of over 1000 people for face recognition research. The 3D face
models are generated using a stereo vision technique enhanced by light
projection to provide a higher density of features. Each face model requires a
single shot taken with a 3D camera, from which the model is generated in
sub-second processing time.

For the purpose of this evaluation, we use a subset of the 3D face
database, acquired during preliminary data acquisition sessions. This set consists
of 330 face models taken from 100 different people under the conditions shown
in Fig. 1.

During capture, no effort was made to control lighting conditions. In
order to generate face models at various head orientations, subjects were asked
to face reference points positioned roughly 45 degrees above and below the camera, but
no effort was made to enforce a precise angle of orientation. Examples of the
face models generated for each person are shown in Fig. 1.

3D face models are stored in the OBJ file format (a common
representation of 3D data) and orientated to face directly forwards using our
orientation normalisation algorithm (not described here) before being converted
into depth maps. The database is then separated into two disjoint sets: the
training set consisting of 40 depth maps of type 01 (see Fig. 1) and a test set of
the remaining 290 depth maps, consisting of all capture conditions shown in Fig.
1. Both the training set and test set contain subjects of various race, age and
gender and nobody is present in both the training and test sets.

In previous work we have shown that the use of image processing
techniques can significantly reduce error rates of two-dimensional face
recognition methods, by removing unwanted features caused by environmental
capture conditions. Much of this environmental influence is not present in the
3D face models, but pre-processing may still aid recognition by making
distinguishing features more explicit. In this section we describe a number of
surface representations, which may affect recognition error rates. These surfaces
are derived by pre-processing of depth maps, prior to both training and test
procedures, as shown in Fig. 4.
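By way of illustration only, deriving a surface representation from a depth map can be sketched as a filtering operation. The kernel values below are the standard Sobel and Laplacian operators and a simple two-tap difference; they are assumptions, since the kernels actually used are those of Figure 8 and are not reproduced in this text. The function names and the 105-row by 60-column array layout are likewise hypothetical.

```python
import numpy as np

# Assumed kernels (standard operators), not the kernels of Figure 8.
KERNELS = {
    "horizontal_gradient": np.array([[-1.0, 1.0]]),
    "vertical_gradient": np.array([[-1.0], [1.0]]),
    "sobel_x": np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]),
    "sobel_y": np.array([[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]),
    "laplacian": np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]),
}


def filter_depth_map(depth_map, kernel):
    """Slide `kernel` over `depth_map` (cross-correlation, valid region only)."""
    kh, kw = kernel.shape
    h, w = depth_map.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    # Accumulate one shifted, weighted copy of the depth map per kernel tap.
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * depth_map[dy:dy + out.shape[0],
                                              dx:dx + out.shape[1]]
    return out


def surface_representation(depth_map, name):
    """Return the named (hypothetical) surface representation of a depth map."""
    return filter_depth_map(np.asarray(depth_map, dtype=float), KERNELS[name])


# Example: a surface sloping along the horizontal axis, 105 rows by 60 columns.
ramp = np.tile(2.0 * np.arange(60), (105, 1))
gradient = surface_representation(ramp, "horizontal_gradient")
```

Adding a constant to every depth value leaves each of these filtered surfaces unchanged, because every kernel sums to zero; this is the sense in which derivative representations do not depend on translation along the Z-axis.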

In our approach we define a '3D surface space' by application of
principal component analysis to the training set of facial surfaces, taking a similar
approach to that described by Turk and Pentland [1] and used in previous
investigations.

Consider our training set of facial surfaces, stored as orientation
normalised 60x105 depth maps. Each of these depth maps can be represented
as a vector of 6300 elements, describing a single point within the
6300-dimensional space of all possible depth maps. What's more, faces with a similar
geometric structure should occupy points in a comparatively localised region of
this high dimensional space. Continuing this idea, we assume that different
depth maps of the same face project to nearby points in space and depth maps
of different faces project to far apart points. Ideally, we wish to extract the
region of this space that contains facial surfaces and reduce the dimensionality to a
practical value, while maximising the spread of facial surfaces within the depth
map subspace.

In order to define a space with the properties mentioned above, we
apply principal component analysis to the training set of M depth maps (in our
case M = 40) {Γ1, Γ2, Γ3, ... ΓM}, computing the covariance matrix,

$$C = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^{T}, \qquad \Phi_n = \Gamma_n - \Psi, \qquad \Psi = \frac{1}{M}\sum_{n=1}^{M}\Gamma_n \tag{1}$$

where Φn is the difference of the nth depth map from the average Ψ.
Eigenvectors and eigenvalues of the covariance matrix are calculated using
standard linear methods. The resultant eigenvectors describe a set of axes within
the depth map space, along which most variance occurs within the training set,
and the corresponding eigenvalues represent the degree of this variance along
each axis. The M eigenvectors are sorted in order of descending eigenvalues and
the M' greatest eigenvectors (in our system M' = 40) are chosen to represent
surface space. The effect is that we have reduced the dimensionality of the space
to M', yet maintained a high level of variance between facial surfaces throughout
the depth map subspace.
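The training step can be sketched as follows, as one possible realisation rather than the method actually employed. The sketch assumes NumPy and, for economy, eigen-decomposes the small M x M matrix and maps its eigenvectors back into depth map space (the device used by Turk and Pentland for eigenfaces); the specification itself states only that standard linear methods are used. The function and variable names are hypothetical.

```python
import numpy as np


def train_surface_space(depth_maps, num_components):
    """Return (psi, eigensurfaces, eigenvalues) for a stack of depth maps.

    depth_maps: array of shape (M, height, width).
    eigensurfaces: array of shape (num_components, height * width),
    one unit-length eigenvector of the covariance matrix per row.
    """
    gammas = np.asarray(depth_maps, dtype=float)
    gammas = gammas.reshape(gammas.shape[0], -1)   # each row is one vector Gamma_n
    psi = gammas.mean(axis=0)                      # average depth map Psi
    phi = gammas - psi                             # rows are Phi_n = Gamma_n - Psi
    m = phi.shape[0]

    # Assumption: work with the M x M matrix, which shares its non-zero
    # eigenvalues with the covariance matrix C of equation 1.
    small = phi @ phi.T / m
    eigenvalues, vectors = np.linalg.eigh(small)

    # Sort by descending eigenvalue and keep the greatest ones.
    order = np.argsort(eigenvalues)[::-1][:num_components]
    eigenvalues = eigenvalues[order]

    # Map back to depth map space and normalise each eigensurface.
    surfaces = phi.T @ vectors[:, order]
    norms = np.linalg.norm(surfaces, axis=0)
    surfaces = surfaces / np.where(norms > 1e-12, norms, 1.0)
    return psi, surfaces.T, eigenvalues


# Example with synthetic data standing in for real depth maps.
rng = np.random.default_rng(0)
maps = rng.normal(size=(10, 105, 60))
psi, eigensurfaces, eigenvalues = train_surface_space(maps, 5)
```

Because the depth maps are centred on their average, at most M - 1 eigenvalues are non-zero, so the last of M requested components carries no variance in this sketch.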

We term each eigenvector an eigensurface, containing 6300 elements
(the number of depth values in the original depth maps) which can be displayed
as range images of the facial surface's principal components, shown in Fig. 3.

Once surface space has been defined we project any face into surface
space by a simple matrix multiplication using the eigenvectors calculated from
the covariance matrix in equation 1:

$$\omega_k = u_k^{T}(\Gamma - \Psi), \qquad k = 1, \ldots, M'$$

where u_k is the kth eigenvector and ω_k is the kth weight in the vector Ω^T
= [ω1, ω2, ω3, ... ωM']. The M' coefficients represent the contribution of each
respective eigensurface to the projected depth map. The vector Ω is taken as the
'face-key' representing a person's facial structure in surface space and compared
by either Euclidean or cosine distance metrics.
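A minimal sketch of the projection, assuming NumPy and assuming that the average depth map is subtracted before multiplication (as in the eigenface approach). The function name and the toy four-element basis are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def project_to_face_key(depth_map, psi, eigensurfaces):
    """Project a depth map into surface space and return its face-key.

    psi: average depth map as a flat vector.
    eigensurfaces: one eigensurface per row.
    """
    gamma = np.asarray(depth_map, dtype=float).ravel()
    # Each weight is the dot product of one eigensurface with (Gamma - Psi).
    return eigensurfaces @ (gamma - psi)


# Toy example: a 2x2 depth map and two axis-aligned eigensurfaces.
toy_psi = np.ones(4)
toy_basis = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
face_key = project_to_face_key([[3.0, 4.0], [5.0, 6.0]], toy_psi, toy_basis)
```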

In addition, we can also divide each face-key by its respective
eigenvalues, prior to distance calculation, removing any inherent dimensional
bias and introducing two supplementary metrics, the Mahalanobis distance and
weighted cosine distance. An acceptance (the two facial surfaces match) or
rejection (the two surfaces do not match) is determined by applying a threshold
to the calculated distance. Any comparison producing a distance below the
threshold is considered an acceptance. In order to evaluate the effectiveness of
the face recognition methods, we carry out 41,905 verification operations on the
test set of 290 facial surfaces, computing the error rates produced (see Fig. 4).
Each surface in the test set is compared with every other surface, no image is
compared with itself and each pair is compared only once (the relationship is
symmetric).
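The four distance measures and the threshold decision can be sketched as below, assuming NumPy. The cosine distance is taken here to be one minus the cosine of the angle between two face-keys, which is an assumption, as the text does not give its exact form. The division by eigenvalues follows the wording above; the textbook Mahalanobis distance would divide by the square roots of the eigenvalues instead. All function names are hypothetical.

```python
from math import comb

import numpy as np


def euclidean(a, b):
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    # Assumed form: 1 - cos(angle between a and b).
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mahalanobis(a, b, eigenvalues):
    # Euclidean distance after dividing each face-key by its eigenvalues.
    return euclidean(a / eigenvalues, b / eigenvalues)


def weighted_cosine(a, b, eigenvalues):
    # Cosine distance after dividing each face-key by its eigenvalues.
    return cosine_distance(a / eigenvalues, b / eigenvalues)


def accept(key_a, key_b, threshold, metric=euclidean):
    """A distance below the threshold is an acceptance."""
    return metric(key_a, key_b) < threshold


# Every unordered pair of the 290 test surfaces is compared exactly once.
num_comparisons = comb(290, 2)
```

The pair count is 290 x 289 / 2 = 41,905, which agrees with the number of verification operations stated above.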

False acceptance rates and false rejection rates are calculated as the
percentage of incorrect acceptances and incorrect rejections after applying the
threshold. Applying a range of thresholds produces a series of FAR, FRR pairs,
which are plotted on a graph as shown for our benchmark system in Fig. 5. The
equal error rate can be seen as the point where FAR equals FRR.
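The calculation of FAR, FRR and the equal error rate can be sketched in plain Python. The sketch assumes that FAR is taken as a percentage of comparisons between different people and FRR as a percentage of comparisons between surfaces of the same person, and it approximates the equal error rate by the swept threshold at which the two rates are closest. The function names and the sample distances are hypothetical.

```python
def error_rates(distances, same_person, threshold):
    """Return (FAR, FRR) in percent for one threshold."""
    genuine = [d for d, s in zip(distances, same_person) if s]
    impostor = [d for d, s in zip(distances, same_person) if not s]
    # False acceptance: different people, but distance below the threshold.
    far = 100.0 * sum(d < threshold for d in impostor) / len(impostor)
    # False rejection: same person, but distance not below the threshold.
    frr = 100.0 * sum(d >= threshold for d in genuine) / len(genuine)
    return far, frr


def equal_error_rate(distances, same_person):
    """Sweep thresholds and return the rate where FAR and FRR are closest."""
    thresholds = sorted(set(distances))
    thresholds.append(thresholds[-1] + 1.0)   # a threshold accepting everything
    pairs = [error_rates(distances, same_person, t) for t in thresholds]
    far, frr = min(pairs, key=lambda pair: abs(pair[0] - pair[1]))
    return (far + frr) / 2.0


# Illustrative distances only; not measured results.
sample_distances = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
sample_same = [True, True, True, False, False, False]
sample_eer = equal_error_rate(sample_distances, sample_same)
```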

We now present the results gathered from testing the three-dimensional
face recognition methods on the test set of 290 facial surfaces. The results are
presented by error curves of FAR vs. FRR and bar charts of EERs. Fig. 5
shows the error curve calculated for the baseline system (facial surface depth
maps) using the four distance measures described in section 6.

The results clearly show that dividing by eigenvalues to normalise vector
dimensions prior to calculating distance values significantly decreases error rates
for both the Euclidean and cosine distance measures, with the Mahalanobis
distance providing the lowest EER. The same four curves were produced for all
surface representations described in section 4 and the EERs taken as a single
comparative value, presented in Fig. 6.

It is clear from the EERs shown in Fig. 6 that surface gradient
representations provide the most distinguishing information for face recognition.
The horizontal derivatives give the lowest error rates of all, using the weighted
cosine distance metric. In fact, the weighted cosine distance returns the lowest
error rates for the majority of surface representations, except for a few cases
when the weighted cosine EER is particularly high. However, which is the most
effective surface representation seems to be dependent on the distance metric
used for comparison (see Fig. 7), except for curvature representations, which are
generally less distinguishing, regardless of the distance metric used.

Due to the orthogonal nature of the most effective surface
representations (horizontal and vertical derivatives), we hypothesize that
combining these representations will reduce error rates further. Therefore, in
addition to the systems shown in Fig. 6, we test a number of system
combinations by concatenating the face-keys projected from numerous surface
spaces, attempting to utilise distinguishing features from multiple surface
representations. The results are shown in Table 1, calculated by
applying the weighted cosine distance measure to the extended face-key
combinations.

Table 1. Equal error rates of surface space combination systems

Surface Space Combinations                                      EER
--------------------------------------------------------------  -----
Sobel X, Sobel Y, horizontal gradient large,                    12.1%
  vertical gradient
Laplacian, horizontal gradient large,                           11.6%
  vertical gradient large
Laplacian, Sobel X, horizontal gradient,                        11.4%
  horizontal gradient large, vertical gradient,
  vertical gradient large
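The combination of face-keys can be sketched as a simple concatenation, assuming NumPy. Each face-key is divided by the eigenvalues of its own surface space before the keys are joined, which is equivalent to weighting the extended face-key afterwards. The cosine distance is again assumed to be one minus the cosine of the angle between keys, and the function names and numerical values are hypothetical.

```python
import numpy as np


def combined_face_key(face_keys, eigenvalue_sets):
    """Concatenate eigenvalue-weighted face-keys from several surface spaces."""
    weighted = [np.asarray(key, dtype=float) / np.asarray(lam, dtype=float)
                for key, lam in zip(face_keys, eigenvalue_sets)]
    return np.concatenate(weighted)


def cosine_distance(a, b):
    # Assumed form: 1 - cos(angle between a and b).
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Two surface spaces (for example a horizontal and a vertical gradient space),
# with illustrative face-keys and eigenvalues.
probe = combined_face_key([[2.0, 4.0], [9.0]], [[2.0, 4.0], [3.0]])
gallery = combined_face_key([[2.0, 4.0], [-9.0]], [[2.0, 4.0], [3.0]])
distance = cosine_distance(probe, gallery)
```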



We have shown that a well-known two-dimensional face recognition
method can be adapted for use on three-dimensional face models. Tests have
been carried out on a large database of three-dimensional facial surfaces,
captured under conditions that present typical difficulties when performing
recognition. The error rates produced from baseline three-dimensional systems
are significantly lower than those gathered in similar experiments using
two-dimensional images. It is clear that three-dimensional face recognition has
distinct advantages over conventional two-dimensional approaches.

Experimenting with a number of surface representations, we have
discovered that facial surface gradient is more effective for recognition than
depth and curvature representations. In particular, horizontal gradients produce
the lowest error rates. This seems to indicate that horizontal derivatives provide
more discriminatory information than vertical profiles. Another advantage is
that gradients are likely to be more robust to inaccuracies in the alignment
procedure, as the derivatives will be invariant to translations along the Z-axis.

Curvature representations do not seem to contain as much
discriminatory information as the other surface representations. We find this
surprising, as second derivatives should be less sensitive to inaccuracies of
orientation and translation along the Z-axis. However, this could be a reflection
of inadequate 3D model resolution, which could be the cause of the noisy
curvature images in Figure 8.

Testing three distance metrics has shown that the choice of method for
face-key comparisons has a considerable effect on the resulting error rates. It is
also evident that dividing each face-key by its respective eigenvalues, normalising
dimensional distribution, usually improves results for both Euclidean and cosine
distances. This indicates that dimensional distribution is not necessarily
proportional to discriminating ability and that surface space as a whole becomes
more discriminative when distributed evenly. However, this is not the case for
some of the surface representations with higher EERs, suggesting that these
representations incorporate only a few dominant useful components, which
become masked when normalised with the majority of less discriminatory
components.

The weighted cosine distance produces the lowest error rates for the
majority of surface representations, including the optimum system. This metric
has also provided the means to combine multiple face-keys, in an attempt to
utilise advantages offered by numerous surface representations, reducing error
rates further.

We have managed to reduce error rates from 17.8% EER, obtained
using the initial depth maps, to an EER of 12.1% when the most effective
surface representations were combined into a single system. These results are
substantially lower than the best two-dimensional systems tested under similar
circumstances in our previous investigations, proving that geometric face
structure is useful for recognition when used independently from colour and
texture information and capable of achieving high levels of accuracy. The fact that
the data capture method produces face models invariant to lighting conditions
and provides the ability to recognise faces regardless of pose makes this system
particularly attractive for use in security and surveillance applications.

In this specification, the verb "comprise" has its normal dictionary
meaning, to denote non-exclusive inclusion. That is, use of the word "comprise"
(or any of its derivatives) to include one feature or more, does not exclude the
possibility of also including further features.

The reader's attention is directed to all and any priority documents
identified in connection with this application and to all and any papers and
documents which are filed concurrently with or previous to this specification in
connection with this application and which are open to public inspection with
this specification, and the contents of all such papers and documents are
incorporated herein by reference.

All of the features disclosed in this specification (including any
accompanying claims, abstract and drawings), and/or all of the steps of any
method or process so disclosed, may be combined in any combination, except
combinations where at least some of such features and/or steps are mutually
exclusive.

Each feature disclosed in this specification (including any accompanying
claims, abstract and drawings), may be replaced by alternative features serving
the same, equivalent or similar purpose, unless expressly stated otherwise. Thus,
unless expressly stated otherwise, each feature disclosed is one example only of a
generic series of equivalent or similar features.

The invention is not restricted to the details of the foregoing
embodiment(s). The invention extends to any novel one, or any novel
combination, of the features disclosed in this specification (including any
accompanying claims, abstract and drawings), or to any novel one, or any novel
combination, of the steps of any method or process so disclosed.



Fig. 1. Example face models taken from a 3D face database

Fig. 2. Orientation of a raw 3D face model (left) to a frontal pose (middle) and facial
surface depth map (right)

Fig. 3. Average depth map (left most) and first eight eigensurfaces

Fig. 4. Baseline 3D face recognition systems using facial surface depth maps and a
range of distance metrics (axis label: False Rejection Rate / %)

Fig. 5. Diagram of verification test procedure

Fig. 6. Error rates of 3D face recognition systems using optimum surface
representations and distance metrics

Fig. 7. Equal error rates of 3D face recognition systems using a variety
of surface representations and distance metrics

Fig. 8. Brief descriptions of surface representations
with the convolution kernels used